A comprehensive evaluation of multicategory classification methods for microbiomic data

Authors

  • Alexander Statnikov
  • Mikael Henaff
  • Varun Narendra
  • Kranti Konganti
  • Zhiguo Li
  • Liying Yang
  • Zhiheng Pei
  • Martin J Blaser
  • Constantin F Aliferis
  • Alexander V Alekseyenko
Abstract

BACKGROUND: Recent advances in next-generation DNA sequencing enable rapid high-throughput quantitation of microbial community composition in human samples, opening up a new field of microbiomics. One of the promises of this field is linking abundances of microbial taxa to phenotypic and physiological states, which can inform development of new diagnostic, personalized medicine, and forensic modalities. Prior research has demonstrated the feasibility of applying machine learning methods to perform body site and subject classification with microbiomic data. However, it is currently unknown which classifiers perform best among the many available alternatives for classification with microbiomic data.

RESULTS: In this work, we performed a systematic comparison of 18 major classification methods, 5 feature selection methods, and 2 accuracy metrics using 8 datasets spanning 1,802 human samples and various classification tasks: body site and subject classification and diagnosis.

CONCLUSIONS: We found that random forests, support vector machines, kernel ridge regression, and Bayesian logistic regression with Laplace priors are the most effective machine learning techniques for performing accurate classification from these microbiomic data.
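Comparing classifiers on microbiomic data comes down to estimating, for each method, how accurately it predicts held-out samples from their taxon abundances. The sketch below illustrates that kind of evaluation loop under simplifying assumptions: it uses generic k-fold cross-validation with plain accuracy as the only metric (the abstract does not state the evaluation protocol), two simple stand-in classifiers (nearest centroid and 1-nearest-neighbour) rather than the methods the study found most effective, and a small synthetic dataset with hypothetical body-site labels. All function names and data are hypothetical and for illustration only.

```python
import math
import random
from collections import defaultdict

def to_relative_abundance(counts):
    """Normalize one sample's raw taxon counts to proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

# Hypothetical stand-in classifier 1: nearest centroid.
def nearest_centroid_fit(X, y):
    # Average the abundance vectors of each class into one centroid per class.
    sums, n = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        n[yi] += 1
    return {label: [s / n[label] for s in vec] for label, vec in sums.items()}

def nearest_centroid_predict(model, x):
    # Predict the class whose centroid is closest in Euclidean distance.
    return min(model, key=lambda label: math.dist(model[label], x))

# Hypothetical stand-in classifier 2: 1-nearest neighbour.
def one_nn_fit(X, y):
    return list(zip(X, y))

def one_nn_predict(model, x):
    # Predict the label of the single closest training sample.
    return min(model, key=lambda pair: math.dist(pair[0], x))[1]

def cross_val_accuracy(fit, predict, X, y, k=5, seed=0):
    """Plain (unstratified) k-fold cross-validated accuracy."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)

def make_toy_data(n_per_class=20, n_taxa=6, seed=1):
    """Synthetic counts: each hypothetical site has one dominant taxon."""
    rng = random.Random(seed)
    X, y = [], []
    for label, dominant in (("gut", 0), ("oral", 2), ("skin", 4)):
        for _ in range(n_per_class):
            counts = [rng.randint(0, 20) for _ in range(n_taxa)]
            counts[dominant] += 100  # class-specific signal
            X.append(to_relative_abundance(counts))
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    classifiers = {
        "nearest centroid": (nearest_centroid_fit, nearest_centroid_predict),
        "1-nearest neighbour": (one_nn_fit, one_nn_predict),
    }
    for name, (fit, predict) in classifiers.items():
        print(f"{name}: {cross_val_accuracy(fit, predict, X, y):.2f}")
```

A full comparison in the spirit of the study would repeat this loop over many classifiers, feature selection methods, metrics, and datasets; the fixed seed here only keeps the toy fold assignment reproducible.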

Similar resources

A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis

MOTIVATION Cancer diagnosis is one of the most important emerging clinical applications of gene expression microarray technology. We are seeking to develop a computer system for powerful and reliable cancer diagnostic model creation based on microarray data. To keep a realistic perspective on clinical applications we focus on multicategory diagnosis. To equip the system with the optimum combina...

One-against-all multicategory classification via discrete support vector machines

Discrete support vector machines (DSVM), recently proposed in [10] and [11] for binary classification problems, have been shown to outperform other competing approaches on well-known benchmark datasets. Here we address their extension to multicategory classification, by developing a one-against-all framework in which a set of binary discrimination problems are solved by means of DSVM. Comput...

Matched Gene Selection and Committee Classifier for Molecular Classification of Heterogeneous Diseases

Microarray gene expressions provide new opportunities for molecular classification of heterogeneous diseases. Although various reported classification schemes show impressive performance, most existing gene selection methods are suboptimal and are not well-matched to the unique characteristics... ©2010 Guoqiang Yu, Yuanjian Feng, David J. Miller, Jianhua Xuan, Eric P. Hoffman, Robert Clarke, Ben Davidson,...

Reinforced Multicategory Support Vector Machines

Support vector machines are one of the most popular machine learning methods for classification. Despite its great success, the SVM was originally designed for binary classification. Extensions to the multicategory case are important for general classification problems. In this article, we propose a new class of multicategory hinge loss functions, namely reinforced hinge loss functions. Both th...

Sparse partial least squares classification for high dimensional data.

Partial least squares (PLS) is a well known dimension reduction method which has been recently adapted for high dimensional classification problems in genome biology. We develop sparse versions of the recently proposed two PLS-based classification methods using sparse partial least squares (SPLS). These sparse versions aim to achieve variable selection and dimension reduction simultaneously. We...

Volume 1

Publication date: 2013